On the Power of Adaptivity in Sparse Recovery
The goal of (stable) sparse recovery is to recover a k-sparse approximation x*
of a vector x from linear measurements of x. Specifically, the goal is
to recover x* such that ||x-x*||_p <= C min_{k-sparse x'} ||x-x'||_q for some
constant C and norm parameters p and q. It is known that, for or
, this task can be accomplished using non-adaptive
measurements [CRT06] and that this bound is tight [DIPW10,FPRU10,PW11].
In this paper we show that if one is allowed to perform measurements that are
adaptive, then the number of measurements can be considerably reduced.
Specifically, for and we show:
- A scheme with measurements that uses rounds. This is a significant
improvement over the best possible non-adaptive bound.
- A scheme with measurements that uses /two/ rounds. This improves over
the best possible non-adaptive bound.
To the best of our knowledge, these are the first results of this type.
As an independent application, we show how to solve the problem of finding a
duplicate in a data stream of items drawn from using
bits of space and passes, improving over the best
possible space complexity achievable using a single pass. Comment: 18 pages; appearing at FOCS 201
Lower Bounds for Sparse Recovery
We consider the following k-sparse recovery problem: design an m x n matrix
A, such that for any signal x, given Ax we can efficiently recover x'
satisfying
||x-x'||_1 <= C min_{k-sparse x"} ||x-x"||_1.
It is known that there exist matrices A with this property that have only O(k
log (n/k)) rows.
In this paper we show that this bound is tight. Our bound holds even for the
more general /randomized/ version of the problem, where A is a random variable
and the recovery algorithm is required to work for any fixed x with constant
probability (over A). Comment: 11 pages. Appeared at SODA 201
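The right-hand side of the guarantee above can be made concrete with a small sketch. It is not taken from the paper: the function names `best_k_sparse_l1_error` and `satisfies_guarantee` are hypothetical, and the sketch only evaluates the error bound for a given candidate recovery, it does not perform recovery from Ax.

```python
def best_k_sparse_l1_error(x, k):
    """Return min over k-sparse x'' of ||x - x''||_1.

    The minimum is attained by keeping the k largest-magnitude entries of x,
    so the error is the sum of the remaining magnitudes.
    """
    magnitudes = sorted((abs(v) for v in x), reverse=True)
    return sum(magnitudes[k:])


def satisfies_guarantee(x, x_rec, k, C):
    """Check ||x - x_rec||_1 <= C * min_{k-sparse x''} ||x - x''||_1.

    x_rec is any candidate recovery (hypothetical input for illustration).
    """
    l1_error = sum(abs(a - b) for a, b in zip(x, x_rec))
    return l1_error <= C * best_k_sparse_l1_error(x, k)
```

For instance, with x = [5, 1, -3, 2, 0] and k = 2, the best 2-sparse approximation keeps 5 and -3, so the benchmark error is 1 + 2 = 3.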
Stream Sampling for Frequency Cap Statistics
Unaggregated data, in streamed or distributed form, is prevalent and comes
from diverse application domains which include interactions of users with web
services and IP traffic. Data elements have {\em keys} (cookies, users,
queries) and elements with different keys interleave. Analytics on such data
typically utilizes statistics stated in terms of the frequencies of keys. The
two most common statistics are {\em distinct}, which is the number of active
keys in a specified segment, and {\em sum}, which is the sum of the frequencies
of keys in the segment. Both are special cases of {\em cap} statistics, defined
as the sum of frequencies {\em capped} by a parameter , which are popular in
online advertising platforms. Aggregation by key, however, is costly, requiring
state proportional to the number of distinct keys, and therefore we are
interested in estimating these statistics or more generally, sampling the data,
without aggregation. We present a sampling framework for unaggregated data that
uses a single pass (for streams) or two passes (for distributed data) and state
proportional to the desired sample size. Our design provides the first
effective solution for general frequency cap statistics. Our -capped
samples provide estimates with tight statistical guarantees for cap statistics
with and nonnegative unbiased estimates of {\em any} monotone
non-decreasing frequency statistics. An added benefit of our unified design is
facilitating {\em multi-objective samples}, which provide estimates with
statistical guarantees for a specified set of different statistics, using a
single, smaller sample. Comment: 21 pages, 4 figures, preliminary version will appear in KDD 201
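To make the definition of cap statistics concrete, here is a minimal sketch that computes them exactly. It deliberately uses the costly aggregate-by-key baseline that the abstract sets out to avoid, not the paper's sampling framework; the function name `cap_statistic` and the parameter name `cap` are hypothetical.

```python
from collections import Counter
import math


def cap_statistic(stream, cap):
    """Exact cap statistic: sum over keys of min(cap, frequency of key).

    This aggregates by key, so its state grows with the number of
    distinct keys (the cost the abstract describes).
    """
    frequencies = Counter(stream)
    return sum(min(cap, f) for f in frequencies.values())


# Hypothetical stream of keys with interleaved elements.
stream = ["a", "b", "a", "c", "a", "b"]
distinct = cap_statistic(stream, 1)       # cap of 1 counts active keys
total = cap_statistic(stream, math.inf)   # unbounded cap gives the sum
capped = cap_statistic(stream, 2)         # each key contributes at most 2
```

Setting the cap to 1 recovers the distinct statistic and an unbounded cap recovers the sum, which is why both are special cases.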
External inverse pattern matching
We consider the {\sl external inverse pattern matching} problem. Given a text \t of length over an ordered alphabet , such that , and a number , the problem is to find a pattern \pe\in \Sigma^m which is not a subword of \t and which maximizes the sum of Hamming distances between \pe and all subwords of \t of length . We present an optimal -time algorithm for the external inverse pattern matching problem, which substantially improves on the only known polynomial -time algorithm, introduced by Amir, Apostolico and Lewenstein. Moreover, we discuss a fast parallel implementation of our algorithm on the CREW PRAM model.
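The objective can be illustrated with a brute-force sketch. This is not the algorithm from the paper: it enumerates every candidate pattern and is exponential in the pattern length. The function names and the use of Python strings for the text and alphabet are assumptions made for illustration.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which equal-length strings a and b differ."""
    return sum(c != d for c, d in zip(a, b))


def external_inverse_pattern(text, alphabet, m):
    """Brute force: find a length-m pattern that is not a subword of text
    and maximizes the sum of Hamming distances to all length-m subwords.

    Returns (pattern, score), or (None, -1) if every candidate occurs in text.
    """
    windows = [text[i:i + m] for i in range(len(text) - m + 1)]
    present = set(windows)
    best, best_score = None, -1
    for candidate in product(alphabet, repeat=m):
        pattern = "".join(candidate)
        if pattern in present:
            continue  # "external" means the pattern must not occur in text
        score = sum(hamming(pattern, w) for w in windows)
        if score > best_score:
            best, best_score = pattern, score
    return best, best_score
```

Ties are broken in favor of the first candidate in enumeration order, which is an arbitrary choice of this sketch.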
Cross-Sender Bit-Mixing Coding
Scheduling to avoid packet collisions is a long-standing challenge in
networking, and has become even trickier in wireless networks with multiple
senders and multiple receivers. In fact, researchers have proved that even {\em
perfect} scheduling can only achieve . Here
is the number of nodes in the network, and is the {\em medium
utilization rate}. Ideally, one would hope to achieve ,
while avoiding all the complexities in scheduling. To this end, this paper
proposes {\em cross-sender bit-mixing coding} ({\em BMC}), which does not rely
on scheduling. Instead, users transmit simultaneously on suitably-chosen slots,
and the amount of overlap in different users' slots is controlled via coding.
We prove that in all possible network topologies, using BMC enables us to
achieve . We also prove that the space and time
complexities of BMC encoding/decoding are all low-order polynomials. Comment: Published in the International Conference on Information Processing
in Sensor Networks (IPSN), 201
Deterministic Sampling and Range Counting in Geometric Data Streams
We present memory-efficient deterministic algorithms for constructing
epsilon-nets and epsilon-approximations of streams of geometric data. Unlike
probabilistic approaches, these deterministic samples provide guaranteed bounds
on their approximation factors. We show how our deterministic samples can be
used to answer approximate online iceberg geometric queries on data streams. We
use these techniques to approximate several robust statistics of geometric data
streams, including Tukey depth, simplicial depth, regression depth, the
Theil-Sen estimator, and the least median of squares. Our algorithms use only a
polylogarithmic amount of memory, provided the desired approximation factors
are inverse-polylogarithmic. We also include a lower bound for non-iceberg
geometric queries. Comment: 12 pages, 1 figure
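As a reference point for one of the robust statistics named above, the following is a minimal exact, offline sketch of the Theil-Sen slope estimate (the median of all pairwise slopes). It stores every point and takes quadratic time, so it is not the streaming approximation from the paper; the function name `theil_sen_slope` is hypothetical.

```python
from itertools import combinations
from statistics import median


def theil_sen_slope(points):
    """Median of the slopes of all point pairs with distinct x-coordinates."""
    slopes = [
        (y2 - y1) / (x2 - x1)
        for (x1, y1), (x2, y2) in combinations(points, 2)
        if x1 != x2
    ]
    return median(slopes)


# Four points on the line y = 2x + 1 plus one outlier (hypothetical data).
points = [(0, 1), (1, 3), (2, 5), (3, 7), (4, 100)]
slope = theil_sen_slope(points)
```

Six of the ten pairwise slopes equal 2, so the median is unaffected by the single outlier, which is the robustness property that makes the estimator interesting.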
Interval Selection in the Streaming Model
A set of intervals is independent when the intervals are pairwise disjoint.
In the interval selection problem we are given a set of intervals
and we want to find an independent subset of intervals of largest cardinality.
Let denote the cardinality of an optimal solution. We
discuss the estimation of in the streaming model, where we
only have one-time, sequential access to the input intervals, the endpoints of
the intervals lie in , and the amount of the memory is
constrained.
For intervals of different sizes, we provide an algorithm in the data stream
model that computes an estimate of that, with
probability at least , satisfies . For same-length
intervals, we provide another algorithm in the data stream model that computes
an estimate of that, with probability at
least , satisfies . The space used by our algorithms is bounded
by a polynomial in and . We also show that no better
estimations can be achieved using bits of storage.
We also develop new, approximate solutions to the interval selection problem,
where we want to report a feasible solution, that use
space. Our algorithms for the interval selection problem match the optimal
results by Emek, Halld{\'o}rsson and Ros{\'e}n [Space-Constrained Interval
Selection, ICALP 2012], but are much simpler. Comment: Minor correction
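For comparison with the streaming setting, the classical offline greedy algorithm for interval selection is sketched below. It is not one of the paper's streaming algorithms: it keeps all intervals in memory and sorts them. The function name and the convention that intervals are closed (left, right) pairs are assumptions of this sketch.

```python
def max_independent_intervals(intervals):
    """Greedy by earliest right endpoint: returns a largest set of
    pairwise disjoint closed intervals, given as (left, right) pairs."""
    chosen = []
    last_right = float("-inf")
    for left, right in sorted(intervals, key=lambda iv: iv[1]):
        # Closed intervals are disjoint only if the next one starts
        # strictly after the previously chosen one ends.
        if left > last_right:
            chosen.append((left, right))
            last_right = right
    return chosen


# Hypothetical input.
intervals = [(4, 7), (1, 3), (9, 10), (2, 5), (6, 8)]
selection = max_independent_intervals(intervals)
```

The cardinality of the returned selection is the quantity that the streaming algorithms in the abstract estimate under a memory constraint.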
Lower bounds for sparse recovery
We consider the following k-sparse recovery problem:
design an m x n matrix A, such that for any signal
x, given Ax we can efficiently recover \hat{x} satisfying
||x - \hat{x}||_1 <= C min_{k-sparse x'} ||x - x'||_1. It is known that there exist matrices A with this property that have only O(k log(n/k)) rows.
In this paper we show that this bound is tight.
Our bound holds even for the more general randomized
version of the problem, where A is a random
variable, and the recovery algorithm is required to
work for any fixed x with constant probability (over
A). David & Lucile Packard Foundation; Danish National Research Foundation (MADALGO (Center for Massive Data Algorithmics)); National Science Foundation (U.S.) (grant CCF-0728645); Cisco Community Fellowship Program
Pseudorandomness for Regular Branching Programs via Fourier Analysis
We present an explicit pseudorandom generator for oblivious, read-once,
permutation branching programs of constant width that can read their input bits
in any order. The seed length is , where is the length of the
branching program. The previous best seed length known for this model was
, which follows as a special case of a generator due to
Impagliazzo, Meka, and Zuckerman (FOCS 2012) (which gives a seed length of
for arbitrary branching programs of size ). Our techniques
also give seed length for general oblivious, read-once branching
programs of width , which is incomparable to the results of
Impagliazzo et al. Our pseudorandom generator is similar to the one used by
Gopalan et al. (FOCS 2012) for read-once CNFs, but the analysis is quite
different; ours is based on Fourier analysis of branching programs. In
particular, we show that an oblivious, read-once, regular branching program of
width has Fourier mass at most at level , independent of the
length of the program. Comment: RANDOM 201